[57] ABSTRACT

A method for verifying computer generated data in periodi- 
cally updated and replaced files to determine if data item 
characteristics in the files have changed in an unexpected 
manner. The method involves the steps of selecting a first 
version of each of the data item characteristics and selecting 
a second subsequent version of each of the data item 
characteristics. The first version of each of the data item 
characteristics and the second subsequent version of each of 
the data item characteristics are analyzed to produce first and 
second statistical profiles. The first and second statistical 
profiles of each of the data item characteristics are then 
compared to each other to determine if any of the data item 
characteristics have changed in an unexpected manner.
Finally, the files being periodically updated and replaced are 
monitored to determine if the data item characteristics in the 
files have changed in an unexpected manner. 
WARNINGS AND NOTIFICATIONS

24  FILE IDENTIFICATION

    FILE NAME (DESCRIPTIVE NAME):
    DSN (USE "(0)" FOR GDG's):

26  RECORD LAYOUT:

    PROGRAM FILE/COPY LIBRARY DSN:
    COPY MEMBER (WHERE APPLICABLE):
    LIBRARY TYPE ("P" PANVALET, BLANK FOR OTHER FILES):

28  IDENTIFICATION FOR FILES WITH MULTIPLE RECORD LAYOUTS:

    RECORD IDENTIFIER DATA ITEM NAME:
    RECORD IDENTIFIER DATA ITEM VALUE:

30  CONTROL INFORMATION:

    SYSTEM GENERATED FILE ID:
    CONFIRM THE CHANGE OF FILE DSN (Y):
    CONFIRM DELETION (D):

FIG. 2C
COL 1: CHARACTERISTICS ID

COL 2: TYPE OF DATA WHOSE "CHARACTERISTIC" IS REPRESENTED. "0" = GROUP CHARACTERISTIC
(ALL INSTANCES TAKEN TOGETHER), "T" = TEXT, "N" = NUMBER, OR "D" = DATE.

COL 3: THE WAY IN WHICH THE CHARACTERISTIC IS REPRESENTED. "C" = A COUNT OR [...] = AN
OBSERVATION.

COL 4: TYPE OF DATA USED TO REPRESENT THE CHARACTERISTIC. "C" = A COUNT, "T" = A TEXT
STRING, "N" = A NUMBER, OR "D" = A DATE.

COL 5: CHARACTERISTIC PROPERTY (THE WORDING USED HERE IS THE SAME AS USED IN THE
MONITORING REPORTS - REFER TO THE SAMPLE REPORTS).

COL 6: CHANGE. THE CODE INDICATES A "LEVEL OF IMPORTANCE*" WHICH CORRESPONDS TO A
"DEGREE OF CHANGE*" CODE WITH A VALUE OF "2".

COL 7: "QUANTITY CODE*" FOR MEASURING THE "LEVEL OF IMPORTANCE*" OF AN "ABSENCE OF
CHANGE".

COL 8: THE "LEVEL OF IMPORTANCE*" OF AN "ABSENCE OF CHANGE".

COL 9: APPEARANCE OR DISAPPEARANCE OF CHARACTERISTIC WHEN COMPARED TO THE PREVIOUS
GENERATION OF THE FILE, SHOWN AS A "LEVEL OF IMPORTANCE*" CODE.

* THE VALUES OF THE CODES, AND HOW THEY ARE USED, ARE EXPLAINED IN THE "DETAILED DESCRIPTION
OF THE INVENTION".
METHOD FOR DETERMINING IF DATA
ITEM CHARACTERISTICS IN
PERIODICALLY UPDATED AND REPLACED
FILES HAVE UNEXPECTEDLY CHANGED

FIELD OF THE INVENTION

The present invention relates generally to data processing
and more particularly to a method for verifying computer
generated data to determine if periodically updated or
replaced files have data items which have changed in an
unexpected manner.

BACKGROUND OF THE INVENTION

The volume of information that is processed and stored by
computer systems continues to expand at a remarkable pace
with "desktop" personal computers and other small computer
systems forming the most visible component of this
growth. Most large corporations, however, still rely on
mainframe systems for most of their basic data processing
needs, even though the smaller systems have become faster
and include computer storage media which can accommodate
more data than in the past. This is because mainframe
systems still hold a substantial advantage over small computer
systems in terms of speed, volume of storage, and
above all, capacity for large volume throughput.
Accordingly, mainframe systems continue to meet data
processing requirements that the smaller computer systems
cannot match.

The proliferation of personal computers in the mass
market has forced publishers of personal computer software
to improve their products, making data on these small
machines easier to access. But the benefits realized in the
mass market in terms of improved personal computer
software, have not been seen in the area of mainframe
computer software despite the fact that mainframes, and
their associated software systems, have been around for far
longer. Hence, data in mainframe systems is often far more
difficult to access than data on personal computers, making
it harder to see the results of a computer process. One of the
main reasons mainframe data is more difficult to access is
due to the nature of the processing done on these differently
sized hardware platforms. More specifically, the batch data
typically processed by mainframe systems is far harder to
access than the online data typically processed by personal
computers as will be explained below.

Data processing can be divided into two classes: online
and batch. Online processing is geared towards the immediate
resolution of individual transactions, whereas batch
processing handles large quantities of transactions as a
group. Human interaction with computers is invariably
through online processing, while large scale processing is
most often handled in the batch mode.

Since batch data processing involves large quantities of
data, the detection of errors in the data involves examining
large amounts of the data. In online data processing,
however, each item of information or data results, at least in
part, from an interaction with a person and thus, errors in the
data are more easily and likely to be detected. This personal
interaction or "manual oversight" provides a degree of
quality control. It should be noted, however, that large scale
manual data entry may be regarded as a "batch" process in
this context. Although the data is processed through human
interaction, the processing is nonetheless mechanical in
nature since data entry clerks generally do not read what
they are typing.

In any case, when batch systems encounter undetected
errors in the data, the process may or may not respond to the
error. In the case where the process is affected by the error,
it will either notify the user of a problem in a controlled
fashion (if the possibility of that type of error was foreseen)
or the process will be forced to a halt (when the error is of
unforeseen nature). The error in the data may also go
undetected allowing the process to continue to completion,
so that the incorrect data will not be immediately obvious.

There are many ways in which errors can be introduced
into computer data: from "bugs" in the computer program,
from external sources, from the operating system's
environment, and from errors caused by the computer itself.

With regard to data errors which originate from bugs in
computer programs, virtually all nontrivial computer programs
contain some bugs. Careful design and exhaustive
testing will typically identify most of the bugs, but some
bugs will undoubtedly remain latent in any system, ready to
affect the process when some new combination of circumstances
arises in the data. Systems made up of suites of
programs that work together, are prone to bugs in exactly the
same way, since such software systems are in effect just
large programs.

With regard to data errors which originate from external
sources, computer systems which obtain information from
outside sources are subject to errors from unexpected
changes in the data from those external sources. Although
program bugs are often blamed for such errors, many times
these errors result from a failure of the personnel who are
responsible for the system which produces the data to
communicate with the personnel who are responsible for the
system which receives the data.

As stated earlier, data errors can also be caused by the
system environment. IBM's Multiple Virtual System (MVS)
operating system may be responsible for more large scale
batch data processing than any other system software.
Unlike personal computer software which "crashes"
frequently, MVS installations, which typically support hundreds
or even thousands of simultaneous batch and online
processes, "crash" very rarely. When a MVS operating
system does crash, the crash is usually confined to individual
processes or subsystems. However, MVS does have some
serious limitations which relate to job control language
(JCL), the programming language that links programs to the
data that the programs access. The JCL is difficult to test
since it has limited parameter substitution and inadequate
features for process modularization. MVS also has an inflexible
storage allocation scheme, which requires that storage
requirements be determined in considerable detail in
advance. In addition, MVS tends to require a great deal of
manual (operator) intervention.

With regard to "computer errors," all such computer
errors result either from hardware failures, or manual mistakes.
When computer errors slip through undetected, they
are generally manual in origin.

Present computer data error detection methods are generally
geared towards ensuring that data moved from one
place to another, arrives intact. This is generally accomplished
by creating some kind of redundant representation of
the data, and using the extra information to compare the
original data to the copied version. However, such methods
cannot detect errors in the original data. More specifically,
errors created by software bugs are not detectable by present
methods because such errors originate in the program itself
and not in the failure of the hardware to correctly execute the
program instructions.
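
The limitation of redundancy-based checks described above can be
illustrated with a minimal sketch. It assumes a SHA-256 digest as
the "redundant representation" (the text does not name a specific
technique), and the record contents and variable names are
hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Redundant representation of the data: a SHA-256 digest (assumed)."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical record written by a buggy program: the balance is
# already wrong at the source, before any transfer takes place.
original = b"ACCT0001 BALANCE 0012500000\n"
sent_digest = checksum(original)

# Case 1: the record arrives intact, so the check passes even
# though the content itself is erroneous.
received_intact = bytes(original)
transfer_ok = checksum(received_intact) == sent_digest

# Case 2: the record is damaged in transit, so the check fails.
received_damaged = original.replace(b"ACCT0001", b"ACCT0091")
damage_detected = checksum(received_damaged) != sent_digest

print(transfer_ok, damage_detected)
```

The check confirms only that the copy matches the original. It
says nothing about whether the original was reasonable in the
first place.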
It is, therefore, an object of the present invention to
provide a data verification method for detecting errors which 
have been introduced throughout the entire computer sys- 
tem. 

SUMMARY OF THE INVENTION 

A method for verifying computer generated data in peri- 
odically updated or replaced files to determine if data item 
characteristics in the files have changed in an unexpected 
manner. The method involves the steps of selecting a first 
version of each of the data item characteristics and selecting 
a second subsequent version of each of the data item 
characteristics. The first version of each of the data item
characteristics and the second subsequent version of each of 
the data item characteristics are analyzed to produce first and 
second statistical profiles. The first and second statistical 
profiles of each of the data item characteristics are then 
compared to each other to determine if any of the data item
characteristics have changed in an unexpected manner.

BRIEF DESCRIPTION OF THE DRAWINGS 

A more complete understanding of the invention may be 
obtained from consideration of the following detailed 
description in conjunction with the accompanying drawings
in which: 

FIG. 1 is a data flow diagram of the data verification 
method of the present invention; 

FIG. 2A is a data flow diagram of the file information 
maintenance step as a batch process;

FIG. 2B is a data flow diagram of the file information 
maintenance step as an interactive online process; 

FIG. 2C is a flow chart depicting a preferred approach for 
performing the manual online maintenance task as it relates
to the online file information maintenance processing step of
FIG. 2B; 

FIG. 3A is a data flow diagram of the data definition
processing step of the present method; 

FIGS. 3B and 3C depict two sample report pages produced
by the data definition processing step of the present
method; 

FIG. 3D depicts a second sample report produced by the
data definition processing step of the present method, which
identifies data definition changes since the last file analysis;

FIG. 4A is a data flow diagram of the file analysis 
processing step of the present method; and 

FIGS. 4B and 4C show a table of statistics that are 
collected in the first task and reported in the second task of 
the final analysis processing step of the present method. 

DETAILED DESCRIPTION OF THE 
INVENTION 

The data verification method of the present invention
applies "reasonableness checking" to the data throughout the
system. "Reasonableness checking" operates to identify
gross errors and unreasonable results in computations.
Although most program or software system bugs result in 
"gross" errors, such errors are often hard to find in vast 
computer files. 

Computer files consist of multiple records which are 
divided into fields, where each field represents a data item. 
Conceptually, each file consists of a table with rows and 
columns (which is in fact the terminology of the relational
database discipline). Generally, batch computer processes 
tend to operate in the same way on every record of the same 
type, and in the same way on each item in a column of a file,
although there are exceptions to this. As a result, program- 
ming bugs tend to produce errors which propagate through 
many records. Although these computer errors tend to be 
enormous, such errors are often lost in the even more 
enormous volume of data being processed in any large 
mainframe system. The method of the present invention 
makes it possible to find the erroneous data in the large 
number of records that are affected. 

In order to measure the reasonableness of data in a file, the 
verification method of the present invention establishes a 
standard or "baseline" of reasonableness for every data item,
in every file that is to be verified. Such a "baseline" is 
established in the present invention by picking any version 
of the data in a system, and using that as the yardstick for 
evaluating the next generation of the same data. When in 
use, the verification method of the invention produces statistics
for each data item and compares the data to the
statistics produced for the previous generation of the data. 
Although the individual data item instances will change 
substantially, statistics representing all of the instances of a
data item in a file, are far more stable. Generally, only a 
small percentage of the data items in a file exhibit a radical 
change from period to period. Accordingly, the verification
method of the present invention reduces the number of data 
items in a system that will need special attention to a 
manageable quantity. 
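
The baseline comparison described above can be sketched as
follows. This is an illustration only: the particular statistics
(count, blanks, minimum, maximum, mean), the 25 percent tolerance,
and all function and variable names are assumptions, not details
taken from the text. The sketch also assumes each data item has at
least one non-blank numeric value.

```python
from statistics import mean


def profile(values):
    """Aggregate statistics for all instances of one data item."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "blank": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
    }


def changed_unexpectedly(baseline, current, tolerance=0.25):
    """Names of statistics whose relative change exceeds the tolerance."""
    flagged = []
    for name, old in baseline.items():
        new = current[name]
        if old == 0:
            # No relative change is defined; flag any move away from zero.
            if new != 0:
                flagged.append(name)
        elif abs(new - old) / abs(old) > tolerance:
            flagged.append(name)
    return flagged


# Hypothetical generations of one numeric data item.
previous = [100.0, 102.5, 98.0, 101.0]
current_ok = [101.0, 99.5, 103.0, 100.0]             # ordinary variation
current_bad = [10100.0, 9950.0, 10300.0, 10000.0]    # values scaled by 100

print(changed_unexpectedly(profile(previous), profile(current_ok)))
print(changed_unexpectedly(profile(previous), profile(current_bad)))
```

Every individual value differs between generations, yet only the
second comparison is flagged, because the aggregate statistics of
the first barely move.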

The present invention evaluates the contents of computer 
files. It employs a generic approach which is driven by 
record descriptions (a.k.a. record layouts) which may be
created for use in programs which read from or write to these
files. This software may be used to profile the contents of 
files, monitor changes, detect likely areas of erroneous data, 
generate data domain meta-data, and verify "migrated"
information in parallel implementations and similar uses. 

The generic design of the present invention eliminates 
programming errors inherent in customized solutions to the 
foregoing problems, allows for the immediate implementa- 
tion of solutions to said problems, and provides for a 
thorough evaluation of all file contents. 

At its core, the present invention consists of two main 
parts. The first part compares record layouts over time to 
determine if they have changed in ways that would affect the 
contents of files. The second part performs a generic data 
item evaluation that obtains a description of the contents of 
every data item that is identified in the record layouts (a.k.a.
data item characteristics), and compares these characteristics 
over time (where historical information is available). 
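
The first of these two parts, comparing record layouts over time,
can be sketched as below. The representation of a layout as a
mapping from data item name to a (type code, length) pair, and the
item names themselves, are hypothetical and chosen only for
illustration.

```python
def layout_changes(old, new):
    """List data items that were added, removed, or redefined."""
    changes = []
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            changes.append((name, "removed"))
        elif name not in old:
            changes.append((name, "added"))
        elif old[name] != new[name]:
            changes.append((name, "redefined"))
    return changes


# Hypothetical layouts: item name -> (type code, length).
old_layout = {"ACCT-NO": ("D", 10), "BALANCE": ("P", 9), "BRANCH": ("D", 4)}
new_layout = {"ACCT-NO": ("D", 10), "BALANCE": ("P", 11), "OPEN-DATE": ("D", 8)}

for item, change in layout_changes(old_layout, new_layout):
    print(item, change)
```

Items reported here would be reviewed before the file contents are
compared, since a redefined item cannot be meaningfully matched
against its earlier statistics.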

The complete methodology involves two more peripheral 
components. One is the maintenance of a file information list 
that defines the set of files that are to be evaluated, and 
identifies their record layouts. The second is a process which 
monitors the results of the other steps. This may vary in 
sophistication from simply reviewing printed reports pro- 
duced in the other processing steps, to accessing the same 
information via an online system that is updated in real time. 

The present invention is intended to be primarily used as 
a tool for applications developers, systems managers, data- 
base administrators and the like. In this regard it is mainly 
a tool for systems professionals rather than an end-user 
application. The present invention has many practical uses 
including, but not limited to, monitoring the accounts of 
financial institutions such as banks or brokerage houses, or
monitoring the accounts payable or accounts receivable of 
businesses, or monitoring the processing of insurance claims 
by insurance companies. 
An exemplary embodiment of the verification method of
the present invention will now be described as it applies to
an IBM mainframe world, as used for the IBM MVS and
Time Sharing Option (TSO) systems. Accordingly, the
description which follows will refer to the way data and
processes are handled in those IBM systems. Programming
language references will be specific to COBOL unless
otherwise noted. It should be understood, however, that the
verification method of the present invention as will be
described is a generic process which can generally be
implemented in any software environment. More
specifically, the processing steps used in the present method
as will be described, can be used in the same way for a
variety of data storage or software schemes. Since the
processing steps of the present method are generic, the
process steps do not have to be modified for each file
because the steps modify themselves, thereby functioning
more reliably and objectively. Reliability in a data verification
method is vital since the process steps should not
themselves create errors. Furthermore, objectivity is also
important in a data verification method because errors must
be checked both in places where you would expect them to
occur, and in places where you would not expect them to
occur.

Referring to the data flow diagram of FIG. 1, the basic
processing steps of the verification method of the present
invention are shown. As can be seen, the verification method
is divided into four steps which consist of a file information
maintenance (FIM) processing step 10, a data definition
(DD) processing step 12, a file analysis (FA) processing step
14, and a process monitoring (PM) step 16.

The FIM processing step 10 maintains a minimal amount
of file information that tells the verification system which
files are to be analyzed, and where to find the information
that defines the contents of each file. Although this step is
performed manually, it involves only a small amount of
information for each file. Furthermore, such data is likely to
change very little from one period to the next.

The DD processing step 12 derives data item information
from the file information maintained in the FIM processing
step 10. More specifically, the DD processing step 12
determines how each file should be analyzed, as well as the
variations in the file definitions since the last run. A complex
data processing system may contain thousands of separately
defined data items. The DD processing step 12 separately
reports just those items that are to be changed for the next
file creation/update. Such a report may be reviewed for any
unexpected changes.

The FA processing step 14 is executed for each file as soon
as possible after it is created or updated. This step evaluates
the aggregate statistics on each data item, and responds
appropriately to the changes (or lack thereof) as directed by
the processing control parameters.

The PM processing step 16 monitors the verification
process as it proceeds by receiving the information in the
reports which are maintained in the FA step. The information
thus gathered is posted to a set of 3 online reports (consisting
of a serious anomalies report, a significant variations report
and a detailed information report. Each report shows the
processing timeline, interspersed with a record of what the
data looks like and how it has changed. The three reports
differ in the level of detail reported about each process). The
staff monitoring the file processing only have to watch the
serious anomalies report, using the more detailed reports to
resolve specific issues. The tools provided by the present
method simplify the task of process monitoring which must
be performed in any event.

In other embodiments of the present method, an online
process monitoring "ticker" application can also be provided.
Such an online process monitoring step would alert
users to anomalies in the data and provides for easy "drill
down" access to the more detailed information, through a
more convenient interface.

As described earlier, the information provided by the FIM
processing step 10 tells the rest of the processing steps of the
verification method how to interpret the files that are to be
verified. The FIM processing step 10 provides one entry per
file consisting of rudimentary identifying information for the
[...] record layouts. The file is identified by a
descriptive name and a formal identifier. The record layout
may require slightly more complex identifying information.

Referring to FIGS. 2A and 2B, collectively, data flow
diagrams further detailing the FIM processing step of the
present method are shown. In particular, the data flow
diagram of FIG. 2A embodies the FIM processing step for
batch data processing while the data flow diagram of FIG.
2B embodies the FIM processing step for online data
processing. As shown in both FIGS. 2A and 2B the FIM
processing step consists of the task 18, 18' of manually
maintaining the file information referred to herein as online
maintenance, and then the task 20, 20' of assigning a unique
and consistent File ID to the manually maintained file
information referred to herein as parameter file creation. In
the FIM processing step for the online version of FIG. 2B,
the first task 18' of the step (manual maintenance of the file
information) invokes the second task 20' (assigning File IDs)
as each file information record is maintained. However, in
both versions of the FIM processing step of FIGS. 2A and
2B, the objective is the same, to maintain a consistent means
of identifying files, so that the files may be compared across
[...].

A preferred approach for manual online maintenance task
18' as it relates to online file information maintenance
processing step of FIG. 2B is shown in the flow chart of FIG.
2C. As can be seen, the first box 22 of the flow chart
represents the warnings and notifications which are provided
to a user in response to said user's actions, such as error
messages, confirmations of changes and notification of
deletions.

The first group of items which the online entry procedure
prompts the user for are the "File Identification" items 24
such as the "File Name (Descriptive Name)" and "DSN (Use
"0" for GDGs)." The "File Name (Descriptive Name)" is the
name of the file in plain language. It serves as an essential
piece of system documentation. The "DSN (Use "0" for
GDGs)" is the "data set name" and is the "formal" name
used by the method to "catalogue" the file. The "File ID"
that is subsequently assigned to each file reference is based
primarily on the DSN. The reason for substituting a numerical
[...] in place of the DSN is mainly as a space and time
saving measure. A DSN (on the MVS system) can be 44
bytes long, the binary packed numerical file ID occupies
only 2 bytes. Another reason for using a File ID alias
involves the situation where a file's DSN has to be changed.
In such a situation, the File ID can be reassigned independently
of the DSN, thus maintaining the continuity of
references across file generations. If the file is a generation
data group (GDG) the user will follow the DSN with "(0)"
to indicate the current version. Entering the DSN of an
already specified file entry will cause that entry to be
retrieved for maintenance purposes.

The next group of items which the online entry procedure
prompts the user for are the "Record Layout" items 26 such
as "Program File/Copy Library DSN," "Copy Member
(Where Applicable)," and "Library Type ("P"-Panvalet,
blank for other files)." The "Program File/Copy Library
DSN" is the DSN of the file containing the record layout
information. The "Copy Member (Where Applicable)" is a
member name which further qualifies the record layout. This
is generally required in most cases since the file will most
probably be of a "library" structure. The "Library Type
("P"-Panvalet, blank for other files)" are codes which indicate
a third party "library" maintenance system such as
Panvalet. If omitted, a default "library" type of "partitioned
data set" (PDS) will be assumed.

The next group of items which the online entry procedure
prompts the user for are the "Identification for files with
multiple record layouts" items 28 such as "Record identifier
data item name" and "Record identifier data item value."
The "Record identifier data item name" is the name of the
data item, common to each record layout, that contains a
value used to identify which record layout describes the
current record. It is used in those cases where the file has
multiple alternative record layouts. The "Record identifier
data item value" is the value (i.e. contents) of the named data
item that identifies it as belonging to the record that has been
identified above.

The last group of items which the online entry procedure
prompts the user for are the "Control Information" items 30
such as "System Generated File ID," "Confirm the change of
File DSN (Y)," and "Confirm Deletion (D)." The "System
Generated File ID" item is an alias for the DSN and as such
may be used to retrieve an entry that requires maintenance.
Additionally this item may be specified in conjunction with
the DSN in order to specify a DSN change. The "Confirm
the change of File DSN (Y)" is used with the "System
Generated File ID" field described above and enables a user
to specify that a DSN is to be changed for a file whose File
ID number is specified. In most cases the DSN serves as the
primary means of identifying a file, whereas the system
generated File ID serves as a system generated alias.
Usually, the last thing that one would want to change about
any file is its DSN. However, this data entry item, when
used with the "System Generated File ID" field (see above)
will enable users to specify that a DSN is to be changed for
a file whose File ID number is specified. The "Confirm
Deletion (D)" allows users to delete file information entries.

The preferred approach for manual online maintenance
task 18 as it relates to performing the batch file information
maintenance processing step of FIG. 2A is identical to the
approach described above for the online file information
maintenance processing step of FIG. 2B, except that the
"Warnings and Notifications" 22 and "Control Information"
items 30 are omitted.

Referring again to FIGS. 2A and 2B, the next task 20, 20'
of the FIM processing step involves assigning a unique and
consistent File ID (parameter file creation) to the manually
maintained file information. The system assigns a File ID
based on the DSN. A master list of DSNs is maintained by
the process and is hereinafter referred to as the "File
Identification Cross Reference" file in FIGS. 2A and 2B.
New DSNs are added to this list and are assigned the next
available number. File ID numbers are assigned incrementally
starting with 1. Using a 4 byte binary integer will
provide for a billion unique DSNs. File ID cross reference
items are never deleted, so that if a file is removed from the
list and then later re-added, it will regain its former unique
ID number. A date stamp is preferably added to the File ID
cross reference items at the same time they are assigned a
number for auditing purposes although in other embodiments
of the present method, the date stamp can be omitted
if desired.

With regard to the all-online embodiment of FIG. 2B, the
user is provided with the option of reassigning File ID
numbers in the (unusual) circumstance of a DSN change.

Referring to FIG. 3A, a data flow diagram detailing
the DD processing step of the present method is shown.
The DD processing step uses the file information obtained in
the FIM processing step [...] parameters
used to perform the subsequent processing step
of file analysis. The DD processing step ensures that data
items are correctly matched across time in the same way the
FIM processing step ensures continuity of file references
[...].

The DD processing step involves: the task 32 of building
a job to compile record information; the task 34 of executing
the job to compile the record information; the task 36 of
creating a new data item parameter file; and the task 38 of
finalizing the data item parameter file.

The first two tasks, 32 and 34, of the DD processing step
essentially involve gathering the record layout information
[...]. The third task 36 organizes this
data into a uniform structure and matches data item names
across time periods. The last task 38 of the DD process
"finalizes" the data for the file analysis process.

With regard to the first task 32 of the DD processing step,
[...] number of record layouts which have to be
analyzed in this process. Furthermore, the record layouts
may be stored in a variety of different ways, in a variety of
file formats, or in proprietary "library" maintenance products.
The record layouts may be embedded in program code.
The first task 32 of the DD processing step involves using
the record layout identifying information, as entered in the
FIM processing step, and assembling the record layouts into
a single file. This is accomplished by building a separate job
that assembles the record layouts into the single file. Additional
job control statements are contained in the "Job
Components" file(s). These "Job Components" can easily be
[...] processing standards of any particular
data processing facility.

The job, as constructed above, is then executed in task 34
to produce in one file a complete listing of all of the record
layouts, interspersed with file header records. Each record
layout is identified with its file since the same data item
can easily appear in more than one file. If record
layouts cannot be found, the process is stopped and the
appropriate error message(s) inform the user that record
layouts cannot be located. When this occurs, the user is
instructed to correct either the record layout (which may be
in the wrong place) or the file information itself. It should be
understood, that although this method for assembling the
record layout information is preferred, other methods for
assembling the record layout information are contemplated
by the present invention.

In the third task 36 of the DD processing step, the source
record layouts are interpreted and the record layout parameters
are stored in a standardized format. This task in many
ways mimics the work done by a computer language interpreter
or compiler. The third task 36 also compares the new
data item list to the previous data item list which is stored in
the "Parameter History" file. In the third task 36, the new list
is not added to the "history" at this point, thereby allowing
for the process to be reviewed, corrected and rerun.

FIGS. 3B and 3C depict two sample report pages produced
by the DD processing step of the present method. As
can be seen from these two reports, the data that is derived
from the record layouts follows the format required for
COBOL programming. It should be understood, however,
that similar information can be derived from code used in
other data processing languages, such as "declarations" in
PL/I, "formats" in SAS, "unpack" statements in perl, and the
like. FIG. 3D depicts a second sample report produced by
the DD processing step of the present method, which identifies
data definition changes since the last "file analysis."

The items which make up the report of FIGS. 3B and 3C
are described hereinafter. Every data "item" is assigned an
item number which remains constant from period to period.
(Note that the item numbers that appear on the report are for
reference only.) Only items with a non-zero length, as shown
in the last column, are stored.

The term "Lvl" is the COBOL level number. The actual
level number is important since it may form part of the
definition of a subset of a record layout that is to be used to
define a file's structure. (Some COBOL compilers reassign
level numbers in the compiler since only their relative values
are normally of importance in defining a data structure.)

The term "Name" is the primary identification of data
[illegible] is subsequently replaced in the process with an
alias, the data item number, to save space and improve
performance. If a data item's name is changed, this will be
handled as a deletion of the original item, and the addition
of a new item. It is possible to provide an override mechanism
to force a renamed item to reclaim the ID associated with its
original name, but the benefit of such a mechanism is
questionable. Data item names are rarely changed in batch
processing, without there being some accompanying change in
the way that the data is being handled, which means that the
data item will require special attention in any event. Note
that the data item names shown in the sample reports have
substitutable qualifiers ("&FILE"). Unlike a COBOL compiler,
these qualifiers are processed as they appear in the source.
This is important since this makes the data item name an
absolute reference. Once they have been replaced by a
substitute, the original name is lost. Filler items and slack
bytes are noted explicitly with names "FILLER" and
"SLACK-BYTES", respectively. Internally, and in subsequent
reporting, the data item names are differentiated from
one another by the addition of the start position and length
(in bytes) of the item. For example: "SLACK-BYTES
(123:3)"

The term "Picture" refers to the COBOL picture retained
in order to provide additional data item documentation in the
various reports.

The terms "Occurrence" and [illegible] refer to the occurrence
number of a repeated data item.

The terms "Occurrence" and "of" refer to the total number
of occurrences.

The terms "Occurrence" and "Dpth" refer to the fact that
in COBOL, repeated data items may be nested to 7 levels.
The number shown under this heading indicates the current
level of "OCCURS" nesting.

Occurrence numbers form part of the data item key. For
COBOL, this means that each data item has up to 7
additional numbers which together identify occurrences of a
data item, each of which will be assigned a separate data
item ID. Note that in the report of FIG. 3B, each occurrence
is listed separately.

The term "Nbr/Chr" relates to "N"=number data, or
"C"=character data, as indicated by the "picture" or "usage"
clause. Group items are noted explicitly. Group items are
only of interest in defining the data structure but are not used
in the remainder of the process since they do not themselves
contain data.

The term "Type" refers to the type of data representation.
Character data is always "D" for display. Numbers may be:
"D" for display (including zoned decimal); "P" for packed
decimal; "B" for binary; "1" for a 4 byte internal floating
point (COMP-1); and "2" for an 8 byte internal floating point
(COMP-2).

The term "Digits" refers to the number of digits which are
accommodated (i.e. overall numeric precision).

The term "Decimal Shift" refers to the equivalent of the
power of ten that the number is multiplied by. A
negative amount moves the decimal point to the left, and a
positive amount moves the decimal point to the right. The
decimal shift is omitted for integers.

The terms "Sign" and "U" refer to unsigned items, and the
[illegible]

The term "Depend Item #" refers to the item number
[illegible] occurrences of the data item
shown. As in the case of the "Item" column, the numbers
that appear on the reports refer back to the "Item" numbers
in the first column. The recorded data item history will
[illegible]

The term "Start Pos" refers to the start position in bytes of
[illegible] record size information, [illegible]
variable sized records, is not included. Thus, "Start
Pos" and "Depend Item #" are mutually exclusive by
definition.

The term "Len" refers to the length in bytes of the data
item.

The "New Data Item Characteristics" as used in the report
in FIGS. 3B and 3C, contain data item ID numbers carried
forward from a previous report. All discrepancies between
the new data and the previous period data are listed in the
"Data Item Change Report", as shown in FIG. 3D. A single
"New" item represents an "addition." A single "Old" item
represents a deleted item. An "Old/New" matched pair
represents a change to an item. Name changes are treated as
a deletion with a matching addition. Changes to FILLER and
SLACK-BYTES items are recorded for purposes of analysis
but are not shown on the report. Note that the columns of
FIG. 3D are virtually identical to those in the report of FIGS.
3B and 3C. The major difference is that the "item" number
in the first column of the report of FIGS. 3B and 3C is
replaced in FIG. 3D with an "Old/New" indicator.

Reviewing the "Data Item Change Report" of FIG. 3D
provides a user with the salient features of the record
layouts, by showing only the changes. The report of FIG. 3D
is likely to be far smaller than the "Complete Data Item List"
report of FIGS. 3B and 3C, which has an entry for each data
item defined for each file, and is likely to contain hundreds
of pages in a typical implementation.

The entire method from FIM processing step 10 through
the third task 36 of the DD processing step may be rerun as
necessary until the user is satisfied that the file information
parameters and record layouts are correct.

Referring again to FIG. 3A, the final or fourth task 38 of
the DD processing step finalizes the information as is
described below. The fourth task 38 of the DD processing
step involves using the "New Data Item Parameters" for
actual file verification, once the data is correct. In particular,
the fourth task 38 of the DD processing step "stamps" each
data item parameter with the new "production" date, and
adds the new parameter information to the "Data Item
Parameter History" (thereby creating the "Updated Data
Item Parameter History" as shown in the diagram). The
"Finalized Data Item Parameters" can now be used to drive
the rest of the process. The fourth task 38 is preferably
performed in the production schedule as a prerequisite to the
creation or updating of the files that are to be verified. The
assumption is made that the "production date" is readily
available on the system. If this is not the case, then the
"production date" can easily be provided as part of the fourth
task.

The FA processing step, as described earlier, is executed
for each file as soon as possible after the data set is created
or updated. The FA processing step evaluates the aggregate
statistics on each data item, and responds appropriately to
the changes (or lack thereof) as directed by the processing
control parameters. The FA processing step is run repeatedly,
once for each file that requires verification. In the case of
multiple record layouts in a file, it is run separately for each
type of record that each record layout describes.

It is preferred that the FA processing step be run as soon
as possible after the creation of the file, so that problems can
be identified immediately. Accordingly, if action needs to be
taken, then it can be taken quickly in order to prevent the
contamination of other systems.

Referring now to FIG. 4A, a data flow diagram further
detailing the FA processing step of the present method is
shown. As can be seen, the FA processing step performs the
task 40 of accumulating statistics on file and data item
characteristics by evaluating the aggregate statistics on each
data item and then performs the task 42 of comparing the
data item characteristics to previous statistics and reports on
the changes found.

During the first task 40, as each file is read, the "Current
Production Date", the "File Information Parameters"
(produced by the FIM processing step), and the "Data Item
Parameters" (produced by the DD processing step) are used
to analyze the data.

The "Current Production Date" serves, in part, to ensure
that the "Data Item Parameters" have been finalized for the
current production period. An override parameter is provided
so that historical file analysis data can be added for the
verification of files that are being added to the process for the
first time. Previous period file analyses are not mandatory for
the initial verification. If omitted, they will accumulate over
time.

The data is analyzed using an algorithm to produce
statistics on file and data item characteristics. The data item
characteristics provide relatively little information. What the
data item characteristics do provide, however, are: 1) a name
by which the data item may be identified; 2) where the data
item is located in each record; and 3) how to evaluate
numeric data items.

Other known characteristics about the data are combined
with the data item parameter information from above, to
derive file and data item characteristic values from each file.

FIGS. 4B and 4C show a table of statistics that are
collected in the first task, and reported on in the second task
of the FA processing step. As this table suggests, the process
is fairly generalized, allowing for new items to be added to
the table. Some of the items, however, are interdependent and
will require specific (i.e. non-generalized) processing. Each
data item characteristic shown in the table of FIGS. 4B and
4C is discussed hereinafter. The "Characteristic IDs" in
column 1 serve as a means of identifying specific data item
characteristics.

Characteristic ID #0 is a count of the records in a file. Data
that is defined as "character" or "text" (the terms are used
interchangeably here), is evaluated for the characteristics
listed in the table of FIGS. 4B and 4C with ID numbers from
1 through 23 as indicated by the "T" in column 2. All of the
characteristics in the table are either "counts of data items
with specific characteristics", or they are other "observations"
about the data. This is indicated by the code in column
3. To better understand the descriptions of the characteristics
as they appear in column 5, just add the phrase "The number
of items containing" to the beginning of each phrase which
represents a count.

Even if a data item is defined as "text" it may still be
numeric. Characteristics with ID numbers 3, 4, 11, 12, 14,
15, and 16, are also evaluated as numbers. In the case of
characteristic number 14, the data represented therein may
be an external numeric.

If a data item is defined in the record layout as numeric,
it can be evaluated for the characteristics listed in the table
of FIGS. 4B and 4C with ID numbers from 24 through 37
as indicated by the "N" in column 2.

Any text or numeric item can be evaluated as a date item.
Accordingly, in the case of a text item, testing is conducted
for the presence of a three letter month abbreviation, or a
complete month name (this does require the adoption of
certain local "customs" with regards to the representation of
dates, such as language and culture). If a month is identified
as the only alphabetic character element of a text string, the
remainder of the string can be evaluated for year and day
elements.

Number data (whether defined as numeric or not) can be
more easily evaluated as a date, working from the more
common formats such as "CCYYMMDD" (where CCYY is
the century and year, MM is the month, and DD is the day),
to less popular or partial date formats such as "YYJJJ"
(where JJJ is the day of the year), or "DDMM" (which omits
the year entirely). Date characteristics are then recorded as
shown for characteristics with ID numbers 38 to 41. Only
those dates which conform to the most popular formats are
recorded.

It is entirely possible that a data item in a file may contain
items that exhibit text, number and date item characteristics.
However, when evaluating these counts and observations as
a whole, a pattern will emerge for predominantly numeric,
date, or text values. This finding is recorded as characteristic
ID number 42.

Characteristics with ID numbers 43 and 44 represent the
"group" properties of the domain type (the group characteristics
are not dependent on the type of data). The first 350
unique values in each data item are recorded, and the
number of instances of each value is counted, until the 351st
unique value is encountered. If there are more than 350
values in a data item, then characteristic ID number 43 is
flagged as "R" for range, the assumption being made that the
domain of values in the data item is defined only as a value
in the observed range. However, if there are between 1 and
350 unique values, then the domain type is recorded as "E"
for "enumerated domain". This information can identify
possible values of "code" information. The selection of the
number 350 is somewhat arbitrary and is motivated in part
by practical system limitations. However, the assignment of
"code" status to a variable is also somewhat arbitrary. (The
term "enumerated domain" is a technical term and is not,
therefore, arbitrary.)

Characteristics with ID numbers 45 and 46 record the
observed sequence of the contents of a data item, and the
uniqueness of items that are in either ascending or descending
order. The values for "sequence" are "R" for random,
"A" for ascending, "D" for descending or "N" for no



02/24/2004, EAST Version: 1.4.1 



Level of importance   Meaning          Reporting Level
b                     [illegible]      "Detailed Information"
c                     Significant      "Significant Variations"
d                     Serious          "Serious Anomalies"
e                     Probable error   "Serious Anomalies"
sequence (which is what you get when every occurrence of
an item has the same value). If the sequence is either "A" or
"D" then the item has either unique values or non-unique
values.
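
The domain type, sequence and uniqueness characteristics (ID numbers 43 to 46) can be illustrated with a short Python sketch. This is only a sketch under stated assumptions, not the patented implementation: the function names, the use of in-memory iterables, and the handling of empty input are hypothetical choices made for the example. Only the 350-value limit and the "E", "R", "A", "D" and "N" codes come from the description.

```python
from collections import Counter

MAX_ENUMERATED = 350  # limit named in the description ("somewhat arbitrary")


def classify_domain(values, limit=MAX_ENUMERATED):
    """Return ("E", counts) for an enumerated domain, ("R", None) for a range.

    Instances of each unique value are counted until unique value number
    limit + 1 is met; at that point the item is treated as a range.
    Assumption: an empty input is reported as "E" with no counts.
    """
    counts = Counter()
    for value in values:
        if value not in counts and len(counts) == limit:
            return "R", None
        counts[value] += 1
    return "E", counts


def classify_sequence(values):
    """Return (sequence, unique) with sequence in {"A", "D", "R", "N"}.

    "N" means every occurrence has the same value. Uniqueness is reported
    only for "A" and "D" and is None otherwise. Assumption: repeated
    neighbours do not break an ascending or descending run; they only
    make it non-unique.
    """
    ascending = descending = True
    has_repeats = False
    previous = None
    first = True
    for value in values:
        if not first:
            if value > previous:
                descending = False
            elif value < previous:
                ascending = False
            else:
                has_repeats = True
        previous = value
        first = False
    if ascending and descending:
        return "N", None
    if ascending:
        return "A", not has_repeats
    if descending:
        return "D", not has_repeats
    return "R", None
```

Both functions read the values once, which matches the single-pass style of analysis the method describes. A real implementation would work from fixed-position fields in each record rather than Python lists.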

Referring again to FIG. 4A, history files are evaluated 
during the first task 40 of the FA processing step. In 
particular, the file characteristic history file, the character- 
istic count history file, the characteristic observations history 

file, and the code history files are evaluated.
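
As an illustration only, the four history files might be modeled with records like the following. This is a hedged sketch: every class, field and function name is hypothetical, and the key fields (file ID, item ID, characteristic ID, production date) are assumptions, since the description does not spell out the record keys. The remaining fields are limited to what the description attributes to each file, and `make_observation` applies its rule that blank text and zero number or date observations are not recorded.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class HistoryKey:
    """Hypothetical common key shared by all four history records."""
    file_id: int
    production_date: int           # CCYYMMDD


@dataclass
class FileCharacteristicRecord:
    key: HistoryKey
    record_count: int              # characteristic ID 0
    analysis_started: datetime     # date and time the analysis began
    data_item_count: int           # data items defined for the file


@dataclass
class CountRecord:
    key: HistoryKey
    item_id: int
    characteristic_id: int
    count: int                     # written only when non-zero


@dataclass
class ObservationRecord:
    key: HistoryKey
    item_id: int
    characteristic_id: int
    number: Optional[float] = None
    text: Optional[str] = None
    date: Optional[int] = None     # always CCYYMMDD


@dataclass
class CodeRecord:
    key: HistoryKey
    item_id: int
    value: str
    frequency: int                 # occurrences of this value


def make_observation(key, item_id, characteristic_id, value, is_date=False):
    """Build an observation, or return None when it would not be recorded."""
    if isinstance(value, str):
        if not value.strip():
            return None            # blank text is not recorded
        return ObservationRecord(key, item_id, characteristic_id, text=value)
    if value == 0:
        return None                # zero numbers and dates are not recorded
    if is_date:
        return ObservationRecord(key, item_id, characteristic_id, date=int(value))
    return ObservationRecord(key, item_id, characteristic_id, number=value)
```

Exactly one of `number`, `text` and `date` is populated on each observation, mirroring the separate columns described for the observations history file.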

With regard to the file characteristic history file, in addition
to the record count for each period, this file also records
the date and time when the analysis was begun, and the
number of data items defined for that file.

The characteristic count history file contains a record for
every non-zero count characteristic as indicated by a "C"
in column 3 of the table of FIGS. 4B and 4C.

The characteristic observations history file contains a
record for every "observation" characteristic as indicated by
an "O" in column 3 of the table of FIGS. 4B and 4C. Each
record of this file has a separate column for "number",
"text", and "date" observations, only one of which will be
populated depending on the data type. Observation types are
noted in column 4 of the table of FIGS. 4B and 4C. Note that
a number observation may be specified as a "count", as
opposed to some other representative value. "Text" observations
that are blank are not recorded. Number or date
observations that are zero are also not recorded. Dates are
always converted to the CCYYMMDD, 8 digit format for
[illegible]

The code history relates to those data items that have
between 2 and 350 unique values, for which a code history
record is written. This is similar to the characteristic observations
history file with the addition of a count field showing
the frequency of occurrence of each value.

Referring still to FIG. 4A, the second task 42 performed
in the FA processing step involves comparing the most
recent statistics with previous statistics and reporting the
changes found. More specifically, the second task involves
reading the same files as in the first task, namely: the
"Current Production Date"; the "File Information Parameters";
and the "Data Item Characteristics"; but not the file
being verified. The statistics of that file, having been written
to the four "Data Characteristics" files (described above), are
now read back into this step, and are compared to the
information from previous versions, in order to evaluate the
significance of any changes. This step can also be performed
separately on all or part of the recorded analysis in order to
update an online display of the file analysis information as
in a production verification "ticker".

The results of this step are then written to the three report
files of FIG. 1 which show: 1) detailed information; 2)
significant variations; and 3) serious anomalies. For the
"Detailed Information Report", all available information is
reported. The more difficult issue is to determine what levels
of change are "significant" and what levels qualify as
"anomalous".

Referring again to the table of FIGS. 4B and 4C, three
types of period to period comparisons are made: 1) characteristics
that have changed; 2) characteristics that have
remained completely unchanged; and 3) the initial
appearance, or sudden disappearance of a characteristic.

The lowercase letters that appear in columns 6, 8, and 9
have the following meanings, and are reported as shown:

Anything reported on the "Serious Anomalies Report" will
also be reported on the "Significant Variations Report".
Everything will appear on the "Detailed Information
Report". Items that appear on the "Significant Variations
Report" also receive a brief mention on the "Serious Anomalies
Report".

The "importance code" values in column 6 are for a
"change code" value of "2". If the change code is a "3" the
"importance code" is bumped up to the next level. The codes
in column 8 are dependent on the quantity code in column
[illegible] (see below).

Degrees of change are characterized below:

              Change Codes for     Percent change in the proportion of
Description   "Degree of change"   items with the specified characteristic
[illegible]   [illegible]          [illegible]
minor         1                    greater than 0%, and less than 5%
[illegible]   2                    5% [illegible]
major         3                    greater than [illegible]%

It should be understood that change is only defined here
for items with a non-zero count, or observation, in both
periods. Any change in a type "T" observation is coded as
a "2". If the quantity in both periods is "small" (see "quantity
codes" below) then the change code is set to 1, resulting in
no special reporting of changes. Note that the "small quantity"
rule does not apply to non-count, numeric "observations"
since these are not frequencies.

The importance of a characteristic remaining completely
unchanged from one period to the next depends not only on
the type of characteristic but also on the total amount (count
or numerical observation) involved. "Quantity codes" are,
therefore, characterized as follows:

"Quantity Codes"   Type of amount
1                  Small
[remaining rows illegible]

The definition of a "small" quantity (s) as a function of the
total is as follows:

1) if total <= 50 then: s = 50
2) if total > 50 and <= 20,000,000 then:
   s = 950*((e**t - e**-t)/(e**t + e**-t)) + 50
   where t = (total - 50)/1,000,000
3) if total > 10,000,000 then: s = 1,000

A "major quantity" is more than 20% of the total, going no
[illegible]

Enumerated domain (a.k.a. code) changes relate to the
[illegible] any appearance of a new value, or disappearance of
a value that appeared [illegible]
WARNING is chosen because this is not necessarily a
matter for alarm (the proportion of total values affected
would determine the seriousness of this occurrence), but the
user must accommodate the change so as to protect the
referential integrity of the overall system. Changes in code
value frequency counts are assigned an importance code of
"d" for a degree of change of "2".

Overflow or underflow early warning involves comparing
the three most recent versions of the data (including the new
period) to warn users that the data item definitions may be
becoming inadequate for the contents. Where the changes
have been consistent from period to period, the following
conditions may be detected. For text strings, the median
string length may be approaching overflow. For numbers,
the maximum number may be growing too fast, or the values
may be running out of decimal positions. In the case of date
information, the year 2000 may be approaching too soon for
data items that cannot accommodate the century part of the
year. These conditions are indicated on the "Significant
Variations Report".

Any major change in a data item might be reflected as a
change in multiple characteristics for a single data item. In
order to avoid overwhelming the user with redundant
warnings, only the few most serious conditions need to be
mentioned for a single item on the "Serious Anomalies
Report" (where brevity is of the essence). This is done by
mentioning only the most important anomalies for each data
type (as noted by the values in column 2 of the table of
FIGS. 4B and 4C).

The order of importance among characteristic changes (or
lack of change) is as follows: 1) "appearance/
disappearance"; 2) "change"; and 3) "absence of change",
leaving the early warning of overflow conditions as the least
important type of change. The "Change Code" percentages
and the codes in columns 6 through 9 of the table of FIGS.
4B and 4C can be modified to suit a particular entity's
specific needs. Additionally, a "file specific" or "data item
specific" version of the parameters can be provided to
override the installation global parameters. The values
shown in the table of FIGS. 4B and 4C are default values.

The change in record count is always noted on each of the
three reports. However, warning messages will only be
printed as appropriate for large changes. Where there was no
previous version of the data set or of an item, a note should
be made on the reports but this is not indicated as a "serious
anomaly" since the change will already have been noted in
the "Data Item Change Report".

As stated earlier, the PM step monitors the verification
process as it proceeds by posting information to a set of three
online reports consisting of a serious anomalies report, a
significant variations report and a detailed information
report. The PM processing step is a manual operation that
involves "watching" the files being created or updated, and
determining whether or not the process is proceeding correctly.
The information that is generated by the PM processing
step described herein provides monitoring information
on each file within a short time after each file has been
created or updated, thereby allowing for immediate follow-up.

The monitoring can of course be started at any time after
the file processing has begun, but generally the sooner the
problems are identified, the better.

As should now be apparent, the method of the present
invention provides the important benefit of enabling the
verification of computer generated data on the basis of
characteristics of the information itself. The method provides
a means of reporting on the contents of files without
the need to define the structure of those files beyond that
which has already been done in defining record layouts used
in the programs that read from, or write to these files. The
method of the present invention also identifies variations in
file structure definitions. Furthermore, data is compared
across time even though the internal coding of the data may
have changed. In addition, the method of the present invention
can identify new and/or missing items in enumerated
domains and can note variations in file sequence and
changes to the uniqueness of the "sort" item. The method
can provide an early warning for certain types of data
item overflow or underflow and quantification of meaningful
[illegible] can be
[illegible] different systems. Finally,
the method of the present invention centralizes file processes
[illegible] information in a manageable set of reports
[illegible] accessed in a way which is determined by the
processing timeline, and the likely significance of changes in
[illegible]

Numerous modifications to and alternative embodiments
of the present invention will be apparent to those skilled in
the art in view of the foregoing description. Accordingly, this
description is to be construed as illustrative only and is for
the purpose of teaching those skilled in the art the best mode
of carrying out the invention. Details of the invention may
be varied substantially without departing from the spirit of
the invention and the exclusive use of all modifications
which come within the scope of the appended claims is
reserved.

What is claimed is:

1. A method for verifying computer generated data in
periodically updated and replaced files to determine if data
item characteristics in said files have changed in an unexpected
manner, said method comprising the steps of:

selecting a first version of each of said data item
characteristics;

selecting a second subsequent version of each of said data
item characteristics, each of said second subsequent
versions of said data item characteristics being a new
data item parameter;

analyzing said first version of each of said data item
characteristics and said second subsequent version of
each of said data item characteristics to produce first
and second statistical profiles therefore; and

comparing said first and second statistical profiles of each
of said data item characteristics to each other to determine
if any of said data item characteristics have
changed in an unexpected manner or failed to change to
an expected degree.

2. The method according to claim 1, further comprising
the step of providing identifying information for each of said
files to create a file information parameters file for each of
said files prior to said step of selecting a first version of each
of said data items, whereby said files can be compared
across time, and wherein said first version of said data items
are selected from said file information parameters file of
each of said files.

3. The method according to claim 2, wherein said step of
providing identifying information includes the step of
assigning identifying means to each of said files.

4. The method according to claim 3, wherein said step of
assigning identifying means includes the step of dating each
of said identified files.

5. The method according to claim 3, wherein each of said
files includes record layouts and said step of assigning
identifying means includes the step of assigning identifying
information to said record layouts of each of said files.

6. The method according to claim 5, wherein said step of
compiling record layout information into a single file
includes the steps of:
organizing said record layout information into a listing of

all of said record layouts; and 
matching each of said data items across time periods. 

7. The method according to claim 2, wherein said step of 
selecting a second subsequent version of each of said data 
items includes the step of compiling record layout information
from said file information parameters file of each of said
files into a single file. 

8. The method according to claim 7, wherein said step of 
selecting a second subsequent version of each of said data 
items further includes the step of creating a new data item 
parameter file which includes said new data item parameters. 

9. The method according to claim 8, wherein said step of 
selecting a second subsequent version of each of said data 
items further includes the step of providing a data item 
parameter change report. 

10. The method according to claim 9, wherein said step of 
selecting a second subsequent version of each of said data 
items further includes the steps of: 

assigning said new data item parameters with a produc- 
tion date; and 

adding said new parameter data item to a data item 
parameter history. 

11. The method according to claim 8, wherein said
method modifies its own processing parameters to accom- 
modate changes in file structure and organization without 
manual intervention. 

12. The method according to claim 2, wherein said step of 
analyzing includes the step of reading said file information 
parameters, said new data item parameters and a current 
production date file. 

13. The method according to claim 12, wherein said step 
of analyzing further includes the step of quantifying changes 
observed in said file information parameters, said new data 
item characteristics and said current production date file to 
produce said first and second statistical profiles. 

14. The method according to claim 13, wherein said step 
of analyzing further includes the step of reporting said first 
and second statistical profiles. 

15. The method according to claim 14, wherein said step 
of analyzing further includes the step of identifying the 
introduction of unexpected data characteristics. 

16. The method according to claim 13, wherein said step 
of analyzing includes the step of determining the signifi- 
cance of a lack of change of data item statistics over time. 

17. The method according to claim 14, wherein said step 
of analyzing further includes the step of determining the 
impending threat of one of overflow problems and under- 
flow problems including century overflow in date items, a 
lack of decimal precision, one of maximum or minimum 
numerical values approaching a limit of the containing data 
item, maximum string lengths approaching said limit of the 
containing data item, and table item overflow. 

18. The method according to claim 12, wherein said step 
of providing identifying information includes the step of 
determining the characteristics of said files using record 
layout information and the contents of said files under 
investigation. 

19. The method according to claim 18, wherein said 
record layout information comprises data item identification, 
data item length and position. 

20. The method according to claim 19, wherein said 
record layout information further comprises numerical data 
item storage conventions. 

21. The method according to claim 18, wherein said step 
of determining the characteristics of said files using record 
layout information and the contents of said files under 
investigation includes the step of determining data item 
characteristics while reading the file contents in a single 
pass. 

22. The method according to claim 21, wherein said step 
of determining data item characteristics while reading the 
file contents in a single pass includes the step of determining 
data item characteristics as to their validity as dates and 
numbers. 
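
The single-pass validity check of claim 22 can be sketched as follows. This is only an illustrative sketch: it assumes field values arrive as strings, that dates use a `YYYYMMDD` layout, and that numbers are plain signed decimals. The function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed numeric form: optional sign, digits, optional decimal fraction.
_NUMBER = re.compile(r"[+-]?\d+(\.\d+)?")


def is_valid_number(text: str) -> bool:
    return _NUMBER.fullmatch(text.strip()) is not None


def is_valid_date(text: str) -> bool:
    """Check an assumed YYYYMMDD date, rejecting impossible calendar dates."""
    text = text.strip()
    if len(text) != 8 or not text.isdigit():
        return False
    try:
        datetime.strptime(text, "%Y%m%d")
        return True
    except ValueError:
        return False


def profile_field(values):
    """Count valid dates and valid numbers for one data item in a single pass."""
    counts = {"total": 0, "valid_dates": 0, "valid_numbers": 0}
    for value in values:
        counts["total"] += 1
        if is_valid_date(value):
            counts["valid_dates"] += 1
        if is_valid_number(value):
            counts["valid_numbers"] += 1
    return counts
```

Both counts are accumulated in the same loop, so the file contents need to be read only once.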

23. The method according to claim 21, wherein said step 
of determining the characteristics of said files using record 
layout information and the contents of said files under 
investigation includes the step of determining data item 
characteristics according to the identification of enumerated 
domain data and variations in the domain over time. 

24. The method according to claim 21, wherein said step 
of determining the characteristics of said files using record 
layout information and the contents of said files under 
investigation includes the step of identifying sort key com- 
ponent data items and determining whether said sort key 
data items are repeated or are unique. 
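
A minimal sketch of the sort key test in claim 24, assuming records are available as dictionaries and the sort key component names are already known. The function and field names are hypothetical.

```python
def sort_key_is_unique(records, key_fields):
    """Return True if no two records share the same composite sort key.

    `records` is an iterable of dicts; `key_fields` lists the sort key
    component data items in order. Both are assumed inputs.
    """
    seen = set()
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key in seen:
            return False  # the sort key is repeated
        seen.add(key)
    return True
```

For a file known to be sorted on the key, comparing each key with the previous one would give the same answer without holding every key in memory.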

25. The method according to claim 21, wherein said step 
of determining the characteristics of said files using record 
layout information and the contents of said files under 
investigation includes the step of classifying data item and 
file characteristics in a structure that provides extensibility. 

26. The method according to claim 21, wherein said step 
of determining the characteristics of said files using record 
layout information and the contents of said files under 
investigation includes the step of determining data item 
characteristics according to the identification of enumerated 
domain data and variations in the domain over time. 

27. The method according to claim 21, wherein said step 
of determining data item characteristics while reading the 
file contents in a single pass includes the step of determining 
data item characteristics with regard to their validity as dates 
and numbers. 

28. The method according to claim 27, wherein said step 
of determining data item characteristics while reading the 
file contents in a single pass includes the step of determining 
the format of data items which represent date information. 

29. The method according to claim 27, wherein said step 
of determining data item characteristics while reading the 
file contents in a single pass includes the step of evaluating 
external numeric data so as to determine their numerical 
properties. 

30. The method according to claim 12, wherein said step 
of analyzing includes the step of analyzing files with mul- 
tiple record formats. 

31. The method according to claim 1, further comprising 
the step of monitoring said files being periodically updated, 
replaced and added to determine if said data item charac- 
teristics in said files have changed in an unexpected manner. 

32. A method for verifying computer generated data in 
periodically updated and replaced files to determine if data 
item characteristics in said files have changed in an 
unexpected manner, said method comprising the steps of: 

providing identifying information for each of said files to 
create a file information parameters file for each of said 
files, whereby said files can be compared across time; 

selecting a first version of each of said data item charac- 
teristics from each of said file information parameters 
files; 

compiling record layout information from said file infor- 
mation parameters file of each of said files into a single 
file; 

creating a new data item parameter file which includes 
said new data item characteristics; 

selecting a second subsequent version of each of said data 
item characteristics from said new data item character- 
istics file; 

analyzing said first version of each of said data item 
characteristics and said second subsequent version of 
each of said data item characteristics to produce first 
and second statistical profiles therefor; 

comparing said first and second statistical profiles of each 
of said data item characteristics to each other to deter- 
mine if any of said data item characteristics have 
changed in an unexpected manner; and 

monitoring said files being periodically updated and 
replaced to determine if said data item characteristics in 
said files have changed in an unexpected manner. 
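
The comparing step of claim 32 can be illustrated with a short sketch. It assumes each statistical profile is a mapping from a characteristic name to a numeric value, and that "unexpected" means a relative change above a chosen tolerance. The claims do not fix either of these, so the representation, the function name, and the 25% tolerance are all assumptions.

```python
def unexpected_changes(first, second, tolerance=0.25):
    """Compare two statistical profiles and return the characteristics
    whose relative change exceeds the tolerance.

    `first` and `second` map characteristic names to numeric values
    (an assumed representation). A characteristic missing from the
    second profile is also reported.
    """
    flagged = {}
    for name, old in first.items():
        new = second.get(name)
        if new is None:
            flagged[name] = (old, None)
            continue
        baseline = max(abs(old), 1e-9)  # avoid division by zero
        if abs(new - old) / baseline > tolerance:
            flagged[name] = (old, new)
    return flagged
```

With a first profile of `{"record_count": 1000, "max_amount": 500.0}` and a second of `{"record_count": 1040, "max_amount": 900.0}`, only `max_amount` is reported, since its 80% change exceeds the assumed tolerance while the 4% change in `record_count` does not.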

33. The method according to claim 32, wherein said step 
of providing identifying information includes the step of 
assigning identifying means to each of said files. 

34. The method according to claim 33, wherein said step 
of assigning identifying means includes the step of dating 
each of said identified files. 

35. The method according to claim 33, wherein each of 
said files includes record layouts and said step of assigning 
identifying means includes the step of assigning identifying 
information to said record layouts of each of said files. 

36. The method according to claim 32, wherein said step 
of compiling record layout information into a single file 
includes the steps of: 

organizing said record layout information into a listing of 
all of said record layouts; and 

matching each of said data items across time periods. 

37. The method according to claim 32, wherein said step 
of selecting a second subsequent version of each of said data 
items further includes the step of providing a data item 
parameter change report. 

38. The method according to claim 37, wherein said step 
of selecting a second subsequent version of each of said data 
items further includes the steps of: 

assigning said new data item characteristics with a pro- 
duction date; and 

adding said new parameter data item to a data item 
parameter history. 

39. The method according to claim 32, wherein said step 
of analyzing said first version of each of said data item 
characteristics and said second subsequent version of each 
of said data item characteristics includes the step of reading 
said file information parameters, said new data item char- 
acteristics and a current production date file. 

40. The method according to claim 39, wherein said step 
of analyzing said first version of each of said data item 
characteristics and said second subsequent version of each 
of said data item characteristics further includes the step of 
quantifying changes observed in said file information 
parameters, said new data item characteristics and said 
current production date file to produce said first and second 
statistical profiles. 

41. The method according to claim 40, wherein said step 
of analyzing said first version of each of said data item 
characteristics and said second subsequent version of each 
of said data item characteristics further includes the step of 
reporting said first and second statistical profiles. 

* * * * * 